Automatically Predicting Sentence Translation Difficulty

Authors

Abhijit Mishra

Pushpak Bhattacharyya

Michael Carl

Abstract

In this paper we introduce the Translation Difficulty Index (TDI), a measure of the difficulty of translating a text. We first define and quantify translation difficulty in terms of TDI. Any measure of TDI based on direct input from translators is fraught with subjectivity and ad hoc judgments, so we rely instead on cognitive evidence from eye tracking. TDI is measured as the sum of the fixation (gaze) and saccade (rapid eye movement) times of the eye. We then establish that TDI is correlated with three properties of the input sentence: length (L), degree of polysemy (DP) and structural complexity (SC). We train a Support Vector Regression (SVR) system to predict TDIs for new sentences using these features as input. The predictions of our framework correlate well with the empirical gold-standard data, a repository of <L, DP, SC> and TDI pairs for a set of sentences. The primary use of our work is to "bin" sentences to be translated into "easy", "medium" and "hard" categories according to their predicted TDI. This can inform the pricing of a translation task, which is especially useful where parallel corpora for Machine Translation are built through translation crowdsourcing or outsourcing. It can also provide a way of monitoring the progress of second-language learners.
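The two computations the abstract states explicitly, TDI as the sum of fixation and saccade times and the binning of sentences by TDI, can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the GazeEvent record, the millisecond unit, and the two threshold parameters are hypothetical and do not come from the paper, which does not give cut points in its abstract. The SVR prediction step is not reproduced.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class GazeEvent:
    """One eye-tracking event (hypothetical record layout)."""
    kind: str           # "fixation" or "saccade"
    duration_ms: float  # event duration; milliseconds are an assumed unit


def measured_tdi(events: Iterable[GazeEvent]) -> float:
    """Sum fixation and saccade durations, as the abstract defines TDI."""
    return sum(e.duration_ms for e in events if e.kind in ("fixation", "saccade"))


def bin_by_tdi(tdi: float, easy_max: float, medium_max: float) -> str:
    """Map a TDI value to a category using illustrative thresholds."""
    if tdi <= easy_max:
        return "easy"
    if tdi <= medium_max:
        return "medium"
    return "hard"


# Made-up gaze record for a single sentence.
events = [
    GazeEvent("fixation", 220.0),
    GazeEvent("saccade", 40.0),
    GazeEvent("fixation", 310.0),
    GazeEvent("saccade", 35.0),
]
tdi = measured_tdi(events)
category = bin_by_tdi(tdi, easy_max=500.0, medium_max=1000.0)
```

In practice the thresholds would have to be chosen from data, for example from the distribution of TDI values in the gold-standard set. The values above are placeholders.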

Similar resources

Locating and Reducing Translation Difficulty

The challenge of translation varies from one sentence to another, or even between phrases of a sentence. We investigate whether variations in difficulty can be located automatically for Statistical Machine Translation (SMT). Furthermore, we hypothesize that customization of an SMT system based on difficulty information improves the translation quality. We assume a binary categorization for phra...

Automatic sentence segmentation and punctuation prediction for spoken language translation

This paper studies the impact of automatic sentence segmentation and punctuation prediction on the quality of machine translation of automatically recognized speech. We present a novel sentence segmentation method which is specifically tailored to the requirements of machine translation algorithms and is competitive with state-of-the-art approaches for detecting sentence-like units. We also des...

Estimating the Sentence-Level Quality of Machine Translation Systems

We investigate the problem of predicting the quality of sentences produced by machine translation systems when reference translations are not available. The problem is addressed as a regression task and a method that takes into account the contribution of different features is proposed. We experiment with this method for translations produced by various MT systems and different language pairs, ...

Correlating decoding events with errors in Statistical Machine Translation

This work investigates situations in the decoding process of Phrase-based SMT that cause particular errors on the output of the translation. A set of translations postedited by professional translators is used to automatically identify errors based on edit distance. Binary classifiers predicting the sentence-level existence of an error are fitted with Logistic Regression, based on features from...

Statistically Motivated Example-based Machine Translation using Translation Memory

In this paper we present a novel way of integrating Translation Memory into an Example-based Machine Translation System (EBMT) to deal with the issue of low resources. We have used a dialogue of 380 sentences as the example-base for our system. The translation units in the Translation Memories are automatically extracted based on the aligned phrases (words) of a statistical machine translation ...

Publication date: 2013
